A GENERALIZATION OF THE LINDEBERG PRINCIPLE

By Sourav Chatterjee 

University of California, Berkeley 

We generalize Lindeberg's proof of the central limit theorem to 
an invariance principle for arbitrary smooth functions of independent 
and weakly dependent random variables. The result is applied to 
get a similar theorem for smooth functions of exchangeable random 
variables. This theorem allows us to identify, for the first time, the 
limiting spectral distributions of Wigner matrices with exchangeable 
entries. 

1. Introduction and results. J. W. Lindeberg's elegant proof of the
central limit theorem [12], despite being in the shadow of Fourier analytic
methods for a long time, is now well known. It was revived by Trotter [20]
and has since been successfully used to derive CLTs in infinite-dimensional
spaces, where the Fourier analytic methods are not so useful.

While the original Lindeberg method and its extensions compare the
distributions of convolutions in great generality (the history of which is
irrelevant to our discussion, so we refer to [15] for details), it soon
becomes clear that the same principle works not only for sums, but for more
general smooth functions as well. Comparison of $f(X_1,\ldots,X_n)$ and
$f(Y_1,\ldots,Y_n)$ for polynomial $f$ has been examined by Rotar [16] and
Mossel, O'Donnell and Oleszkiewicz [13], and for general smooth $f$ with
bounded derivatives by Chatterjee [5]. In [5], it is shown how to apply the
method to establish universality in physical models, including the
Sherrington-Kirkpatrick model of spin glasses. It was recently observed by
Toufic Suidan [18] that the results in [5] can be used to give an immediate
proof of the universality of last passage percolation in thin rectangles
(originally a result of [3] and [4]). Developed independently, the Mossel,
O'Donnell and Oleszkiewicz paper [13] is another repository of very striking
modern applications of this very old idea.
A closer examination of Lindeberg's method reveals that there is a direct
generalization by which the independence of the coordinates can be
dispensed with. The argument, which may be possible to guess once the 
theorem has been stated, will be given in Section 2. 

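Theorem 1.1. Suppose $X$ and $Y$ are random vectors in $\mathbb{R}^n$ with
$Y$ having independent components. For $1 \le i \le n$, let

$$A_i := \mathbb{E}\,\bigl|\mathbb{E}(X_i \mid X_1,\ldots,X_{i-1}) - \mathbb{E}(Y_i)\bigr|,$$

$$B_i := \mathbb{E}\,\bigl|\mathbb{E}(X_i^2 \mid X_1,\ldots,X_{i-1}) - \mathbb{E}(Y_i^2)\bigr|.$$

Let $M_3$ be a bound on $\max_i(\mathbb{E}|X_i|^3 + \mathbb{E}|Y_i|^3)$.
Suppose $f:\mathbb{R}^n \to \mathbb{R}$ is a thrice continuously
differentiable function, and for $r = 1,2,3$, let $L_r(f)$ be a finite
constant such that $|\partial_i^r f(x)| \le L_r(f)$ for each $i$ and $x$,
where $\partial_i^r$ denotes the $r$-fold derivative in the $i$th
coordinate. Then

$$|\mathbb{E}f(X) - \mathbb{E}f(Y)| \le \sum_{i=1}^n \bigl(A_i L_1(f) + \tfrac12 B_i L_2(f)\bigr) + \tfrac16\, n L_3(f) M_3.$$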
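As a small numerical sanity check of Theorem 1.1 (a sketch under our own choices, not part of the original argument), take $f(x) = \cos((x_1+\cdots+x_n)/\sqrt{n})$, let $X$ have i.i.d. Rademacher components and let $Y$ have i.i.d. components on $\{-\sqrt3, 0, \sqrt3\}$ with probabilities $1/6, 2/3, 1/6$. Both laws have mean 0 and variance 1, so $A_i = B_i = 0$, and $|\partial_i^3 f| \le n^{-3/2}$. The test function, the two laws and all function names below are illustrative assumptions.

```python
import itertools
import math


def expect_f(values, probs, n):
    """Exact E f(V) for f(v) = cos(sum(v)/sqrt(n)), V with i.i.d. discrete coordinates."""
    total = 0.0
    for idx in itertools.product(range(len(values)), repeat=n):
        p = math.prod(probs[k] for k in idx)
        s = sum(values[k] for k in idx)
        total += p * math.cos(s / math.sqrt(n))
    return total


def third_abs_moment(values, probs):
    return sum(p * abs(v) ** 3 for v, p in zip(values, probs))


n = 6
x_vals, x_probs = [-1.0, 1.0], [0.5, 0.5]                # Rademacher
r3 = math.sqrt(3.0)
y_vals, y_probs = [-r3, 0.0, r3], [1 / 6, 2 / 3, 1 / 6]  # mean 0, variance 1

ef_x = expect_f(x_vals, x_probs, n)
ef_y = expect_f(y_vals, y_probs, n)

M3 = third_abs_moment(x_vals, x_probs) + third_abs_moment(y_vals, y_probs)
L3 = n ** -1.5                   # bound on the third pure partials of f
bound = n * L3 * M3 / 6          # right-hand side of Theorem 1.1 when A_i = B_i = 0
gap = abs(ef_x - ef_y)
print(gap, bound)
```

With matching first and second moments only the third-order term survives, so the bound decays like $n^{-1/2}$ for this $f$.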

Let us now say a bit about the condition of boundedness of third deriva- 
tives. The implications of this condition have been inspected in detail in 
the context of convolutions by Zolotarev (see, e.g., [23]) and other authors. 
Zolotarev defines the $\zeta_3$-metric on the space of distributions as follows:
$\zeta_3(F,G) = \sup_f |\int f\,dF - \int f\,dG|$, where the sup is taken over all $f$ with
third derivative bounded by 1. The $\zeta_3$ metric has not been so popular in
practice because of the difficulty in connecting this metric with the common 
notions of distance between measures. 

However, instead of taking supremum over a class of $f$'s, we consider
only individual functions of interest. For instance, in the random matrix
scenario, our $f$ will be the Stieltjes transform of a matrix at a fixed
$z \in \mathbb{C}\setminus\mathbb{R}$, which is a nice $C^\infty$ function of
the original matrix. In the paper [5], the author considered the partition
function of a disordered physical system (the Sherrington-Kirkpatrick model
of spin glasses), which again turns out to be a $C^\infty$ function of the
disorder matrix, and has nicely bounded derivatives.

The condition of boundedness of the derivatives can be dropped (as 
demonstrated in [16] and [13]) by careful examination of the remainder term; 
but our focus is different: We are more concerned with ways to extend the 
method to the case of weakly dependent variables (as in Theorem 1.1 above), 
and more specifically, to exchangeable random variables, as below. 

Exchangeable random variables. Let us now present a surprisingly nontrivial
application of the basic tool developed in the previous section. Suppose $X$
is a vector with exchangeable components. Certainly, we cannot expect to
replace the components of $X$ by independent Gaussians as we did in
Theorem 1.1. For instance, all the components may be equal to the same
random variable, in which case there is no hope of replacing these variables
by something generic. However, not all is lost; our next theorem shows that
the following "summarization" of $X$ can still be carried through:

Suppose $X$ is a vector with exchangeable components, having finite fourth
moments. Let

$$\hat\mu := \frac1n \sum_{i=1}^n X_i \quad\text{and}\quad \hat\sigma^2 := \frac1n \sum_{i=1}^n (X_i - \hat\mu)^2. \tag{1}$$

Let $Z$ be a standard Gaussian vector in $\mathbb{R}^n$, independent of $X$.
Let $\bar Z := \frac1n\sum_{i=1}^n Z_i$ and

$$Y_i := \hat\mu + \hat\sigma (Z_i - \bar Z), \qquad i = 1,\ldots,n. \tag{2}$$

Then, for sufficiently well-behaved $f$ (to be described below), we have
$\mathbb{E}f(X) \approx \mathbb{E}f(Y)$. That is, $X$ can be "replaced" by
the modified vector $Y$ for evaluation under suitably smooth $f$. Note that
in the process, we summarized the random vector $X$ into the couple
$(\hat\mu,\hat\sigma)$. The precise statement is as follows:
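The construction in (1) and (2) can be sketched in a few lines. This is only an illustration: the helper names `summarize` and `gaussian_surrogate` and the sample law of `x` are our own assumptions, not notation from the paper.

```python
import math
import random


def summarize(x):
    """Return (mu_hat, sigma_hat) as in (1)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)
    return mu, sigma


def gaussian_surrogate(x, rng):
    """Return Y as in (2): Y_i = mu_hat + sigma_hat * (Z_i - Z_bar)."""
    n = len(x)
    mu, sigma = summarize(x)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z_bar = sum(z) / n
    return [mu + sigma * (zi - z_bar) for zi in z]


rng = random.Random(0)
# An i.i.d. sample is in particular exchangeable; the three-point law is arbitrary.
x = [rng.choice([-2.0, 0.5, 3.0]) for _ in range(50)]
y = gaussian_surrogate(x, rng)
```

By construction the coordinates of `y` average exactly to `mu_hat`, mirroring the fact that centering $Z$ by $\bar Z$ removes the one degree of freedom already fixed by $\hat\mu$.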



Theorem 1.2. Suppose $X$ is a random vector with exchangeable components,
and $\hat\mu$, $\hat\sigma$ and $Y$ are defined as in (1) and (2). Let
$f:\mathbb{R}^n \to \mathbb{R}$ be a thrice continuously differentiable
function, and for $r = 1,2,3$, let $L'_r(f)$ be a uniform bound on all $r$th
partial derivatives of $f$, including mixed partials. For each $p$, let
$m_p = \mathbb{E}|X_1 - \hat\mu|^p$. Then we have the bound

$$|\mathbb{E}f(X) - \mathbb{E}f(Y)| \le 9.5\, m_4^{1/2} L'_2(f)\, n^{1/2} + 13\, m_3 L'_3(f)\, n.$$

We postpone the (somewhat long) proof of this theorem until Section 3,
giving only a brief sketch at this point. The first step is to show that
there is no loss of generality in assuming that $\hat\mu = 0$ and
$\hat\sigma = 1$. Having assumed that, if we define

$$R_i := X_i + \frac{1}{n-i+1}\sum_{j=1}^{i-1} X_j,$$

then it is a straightforward exercise (which we will work out, nevertheless)
that $\mathbb{E}(R_i \mid \mathcal{F}_{i-1}) = 0$, where $\mathcal{F}_{i-1}$
is the sigma-algebra generated by $X_1,\ldots,X_{i-1}$. The next step is to
prove

$$\mathbb{E}(R_i^2 \mid \mathcal{F}_{i-1}) = 1 + O((n-i+1)^{-1/2}),$$

which is computationally slightly harder. Having established these
approximations, we can now replace the $R_i$'s by independent Gaussian
variables $V_1,\ldots,V_n$ using Theorem 1.1. However, the inverse
transform, which takes $R$ to $X$, does not take $V$ to $Y$ or anything
resembling $Y$, but to something that is close to $Y$ in distribution. This
will be formalized using Gaussian interpolation techniques.

Before moving on to applications, let us quickly mention without detailed 
justification that Theorem 1.2 has no apparent connection with de Finetti's 
theorem [6] or its finite version due to Diaconis and Freedman [8]. 

An application: Wigner's law for exchangeable random variables. Let
us begin with a very quick introduction to some necessary material from 
random matrix theory. 

Spectral measures. The empirical spectral distribution (ESD) of an
$N \times N$ square matrix $A$ is the probability distribution
$\frac1N \sum_{i=1}^N \delta_{\lambda_i}$, where $\lambda_1,\ldots,\lambda_N$
are the eigenvalues of $A$ repeated by multiplicities, and $\delta_x$ denotes
the point mass at $x$. The weak limit of a sequence of ESDs is called the
limiting spectral distribution (LSD) of the corresponding sequence of
matrices. The existence and identification of LSDs for various kinds of
random matrices is one of the main goals of random matrix theory. For a
Hermitian matrix, the ESD is supported on the real line and hence has a
corresponding cumulative distribution function. We will denote the c.d.f.
for the ESD of a Hermitian matrix $A$ by $F_A$. Explicitly,
$F_A(x) = \frac1N \#\{i : \lambda_i \le x\}$.

Wigner matrices. A standard Gaussian Wigner matrix of order $N$ is a matrix
of the form $A_N = (N^{-1/2} X_{ij})_{1 \le i,j \le N}$, where
$(X_{ij})_{1 \le i \le j \le N}$ is a collection of i.i.d. standard Gaussian
random variables, and $X_{ij} = X_{ji}$ for $i > j$. Wigner [21] showed that
the LSD for a sequence of standard Gaussian Wigner matrices (with order
$N \to \infty$) is the semicircle law, which has density
$(2\pi)^{-1}\sqrt{4 - x^2}$ in the interval $[-2,2]$.
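A quick numerical check (a sketch, with a hand-rolled midpoint rule of our own choosing) confirms that the stated density is a probability density with unit second moment, which is the normalization matched by the $N^{-1/2}$ scaling of the entries.

```python
import math


def semicircle_density(x):
    """Density (2*pi)^{-1} * sqrt(4 - x^2) on [-2, 2], zero elsewhere."""
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi) if abs(x) <= 2.0 else 0.0


def midpoint_integral(g, a, b, steps=200_000):
    """Plain midpoint rule; adequate here despite the square-root endpoints."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


mass = midpoint_integral(semicircle_density, -2.0, 2.0)
second_moment = midpoint_integral(lambda x: x * x * semicircle_density(x), -2.0, 2.0)
print(mass, second_moment)
```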

It was later shown that the distribution of the entries does not play a
significant role; convergence to the semicircle law holds under more general
conditions (cf. [1, 2, 10]). The weakest known condition was given by
Pastur [14]. It is claimed that the condition was shown to be necessary by
Girko [9]. Although most conditions require independence of the entries on
and above the diagonal, there have been some advances (e.g., [7, 17])
allowing certain kinds of dependence. However, none of these cover the case
of exchangeable entries.

For a detailed exposition of the key results about the spectra of Wigner 
matrices and other results in the study of the spectral behavior of large 
random matrices, see [2] or [11]. 

Here we consider the question of identifying the limiting spectral distribu- 
tions for Wigner matrices with exchangeable entries. The following theorem 
gives a precise answer under minimal assumptions. 
Theorem 1.3. Suppose that for each $N$ we have a random matrix
$A_N = (N^{-1/2} X^N_{ij})_{1 \le i,j \le N}$, where the collection
$(X^N_{ij})_{1 \le i \le j \le N}$ is exchangeable and $X^N_{ij} = X^N_{ji}$
for $i > j$. Let

$$\hat\mu_N := \frac{2}{N(N+1)} \sum_{i \le j} X^N_{ij} \quad\text{and}\quad \hat\sigma_N^2 := \frac{2}{N(N+1)} \sum_{i \le j} (X^N_{ij} - \hat\mu_N)^2.$$

Assume that $\hat\sigma_N > 0$ a.s. for all $N$ and
$\sup_{N \ge 1} \mathbb{E}|(X^N_{12} - \hat\mu_N)/\hat\sigma_N|^4 < \infty$.
Then the empirical spectral distribution of $\hat\sigma_N^{-1} A_N$
converges weakly to the semicircle law in probability.



We will derive the above result from a quantitative bound (Lemma 4.1)
on the difference between Stieltjes transforms (to be discussed in Section 4).
The proof will show that it is actually enough to assume the weaker condition
that $\mathbb{E}|(X^N_{12} - \hat\mu_N)/\hat\sigma_N|^4 = o(N^{2/3})$ as
$N \to \infty$. It will also be evident that the argument can be adapted to
more complicated exchangeability assumptions than the most basic one assumed
above.



2. Proof of Theorem 1.1. Throughout the remainder of this article, we
will use the notation $\partial_i f$ instead of the more familiar
$\partial f/\partial x_i$. Similarly, we will write $\partial_i \partial_j f$
instead of $\partial^2 f/\partial x_i \partial x_j$ and so on.

Now let us begin with the proof. Without loss of generality, we can assume
that $X$ and $Y$ are defined on the same probability space and are
independent. For each $i$, $0 \le i \le n$, let

$$Z_i = (X_1,\ldots,X_i,Y_{i+1},\ldots,Y_n) \quad\text{and}\quad Z_i^0 = (X_1,\ldots,X_{i-1},0,Y_{i+1},\ldots,Y_n).$$

Then clearly

$$\mathbb{E}f(X) - \mathbb{E}f(Y) = \sum_{i=1}^n \bigl(\mathbb{E}f(Z_i) - \mathbb{E}f(Z_{i-1})\bigr).$$
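The telescoping identity is purely algebraic and already holds pathwise. A minimal sketch, with an arbitrary test function and fixed vectors chosen only for illustration (the helper name `hybrid` is ours):

```python
import math


def hybrid(x, y, i):
    """Z_i = (x_1, ..., x_i, y_{i+1}, ..., y_n); Z_0 = y and Z_n = x."""
    return x[:i] + y[i:]


def f(v):
    # Any function works for the telescoping identity; smoothness is not needed here.
    return math.sin(sum(v)) + v[0] * v[-1]


x = [0.3, -1.2, 0.7, 2.0]
y = [1.1, 0.4, -0.5, 0.9]
n = len(x)

# One coordinate is swapped from y to x in each step.
increments = [f(hybrid(x, y, i)) - f(hybrid(x, y, i - 1)) for i in range(1, n + 1)]
print(sum(increments), f(x) - f(y))
```

Each increment changes a single coordinate, which is what makes the coordinate-wise Taylor expansion around $Z_i^0$ applicable.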



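Now, by third-order Taylor approximation

$$\Bigl| f(Z_i) - f(Z_i^0) - X_i\,\partial_i f(Z_i^0) - \frac{X_i^2}{2}\,\partial_i^2 f(Z_i^0) \Bigr| \le \frac{|X_i|^3 L_3(f)}{6}$$

and similarly,

$$\Bigl| f(Z_{i-1}) - f(Z_i^0) - Y_i\,\partial_i f(Z_i^0) - \frac{Y_i^2}{2}\,\partial_i^2 f(Z_i^0) \Bigr| \le \frac{|Y_i|^3 L_3(f)}{6}.$$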

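Now, since the $Y_i$'s are independent, we have

$$\mathbb{E}\bigl((X_i - Y_i)\,\partial_i f(Z_i^0)\bigr) = \mathbb{E}\bigl((\mathbb{E}(X_i \mid X_1,\ldots,X_{i-1}) - \mathbb{E}(Y_i))\,\partial_i f(Z_i^0)\bigr).$$

Similarly, we also have

$$\mathbb{E}\bigl((X_i^2 - Y_i^2)\,\partial_i^2 f(Z_i^0)\bigr) = \mathbb{E}\bigl((\mathbb{E}(X_i^2 \mid X_1,\ldots,X_{i-1}) - \mathbb{E}(Y_i^2))\,\partial_i^2 f(Z_i^0)\bigr).$$

Thus, for any $i$,

$$\begin{aligned}
|\mathbb{E}f(Z_i) - \mathbb{E}f(Z_{i-1})| &\le \tfrac16 L_3(f)\bigl(\mathbb{E}|X_i|^3 + \mathbb{E}|Y_i|^3\bigr) \\
&\quad + \bigl|\mathbb{E}\bigl((X_i - Y_i)\,\partial_i f(Z_i^0)\bigr)\bigr| + \tfrac12 \bigl|\mathbb{E}\bigl((X_i^2 - Y_i^2)\,\partial_i^2 f(Z_i^0)\bigr)\bigr| \\
&\le \tfrac16 L_3(f) M_3 + A_i L_1(f) + \tfrac12 B_i L_2(f).
\end{aligned}$$

This completes the proof.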

3. Proof of Theorem 1.2. First, note that $X = Y$ on the event
$\{\hat\sigma = 0\}$. Thus, if $P\{\hat\sigma = 0\} = 1$, there is nothing to
prove. If $P\{\hat\sigma = 0\} < 1$, we can condition on the event
$\{\hat\sigma > 0\}$ and consequently assume, without loss of generality,
that $\hat\sigma > 0$ almost surely, because the conditioning retains the
exchangeability of the $X_i$'s. Thus, let us assume that
$P\{\hat\sigma > 0\} = 1$.

For $i = 1,\ldots,n$ let $\tilde X_i = (X_i - \hat\mu)/\hat\sigma$. Then
$\sum_{i=1}^n \tilde X_i = 0$ and $\sum_{i=1}^n \tilde X_i^2 = n$ (we will be
using these identities numerous times, often without mention). Let
$\tilde Z_i = Z_i - \bar Z$, where $\bar Z = \frac1n \sum_{i=1}^n Z_i$.

In the following, we will use $\mathbb{E}_0$ and $P_0$ to denote the
expectation and probability conditional on the pair $(\hat\mu,\hat\sigma)$.
Observe that $\tilde X$ is a vector with exchangeable components under $P_0$
for all values of $(\hat\mu,\hat\sigma)$.

Now assume that $(\hat\mu,\hat\sigma)$ is given and fixed. Let

$$f_0(x_1,\ldots,x_n) := f(\hat\mu + \hat\sigma x_1, \ldots, \hat\mu + \hat\sigma x_n).$$

Then $f(X) = f_0(\tilde X)$ and $f(Y) = f_0(\tilde Z)$. Note that
$L'_r(f_0) = L'_r(f)\hat\sigma^r$, where $L'_r(g)$ denotes a uniform bound on
the $r$th order derivatives of a function $g$, including mixed partials.

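First, we need to do a list of computations. For $0 \le i \le n$, let
$\mathcal{F}_i$ be the sigma-algebra generated by
$\{\tilde X_1,\ldots,\tilde X_i\}$ and $(\hat\mu,\hat\sigma)$. Since the
$\tilde X_j$'s are exchangeable given $(\hat\mu,\hat\sigma)$, we have
$\mathbb{E}_0(\tilde X_k \mid \mathcal{F}_{i-1}) = \mathbb{E}_0(\tilde X_l \mid \mathcal{F}_{i-1})$
for every $k, l > i-1$, and hence

$$\mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1}) = \frac{1}{n-i+1}\,\mathbb{E}_0\Bigl(\sum_{j=i}^n \tilde X_j \Bigm| \mathcal{F}_{i-1}\Bigr), \qquad \sum_{j=i}^n \tilde X_j = -\sum_{j=1}^{i-1} \tilde X_j,$$

which is $\mathcal{F}_{i-1}$-measurable. Thus,

$$\mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1}) = \frac{1}{n-i+1} \sum_{j=i}^n \tilde X_j = -\frac{1}{n-i+1} \sum_{j=1}^{i-1} \tilde X_j. \tag{3}$$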
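From the above identity and exchangeability, it follows that

$$\mathbb{E}_0\bigl(\mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1})^2\bigr) = \frac{1}{(n-i+1)^2}\bigl[(n-i+1)\,\mathbb{E}_0(\tilde X_1^2) + (n-i+1)(n-i)\,\mathbb{E}_0(\tilde X_1 \tilde X_2)\bigr].$$

Now clearly $\mathbb{E}_0(\tilde X_1^2) = \mathbb{E}_0\bigl(\frac1n \sum_{j=1}^n \tilde X_j^2\bigr) = 1$ and

$$\mathbb{E}_0(\tilde X_1 \tilde X_2) = \frac{1}{n(n-1)} \sum_{i \ne j} \mathbb{E}_0(\tilde X_i \tilde X_j) = -\frac{1}{n(n-1)} \sum_{i=1}^n \mathbb{E}_0(\tilde X_i^2) = -\frac{1}{n-1}.$$

(The second equality holds because $\sum_{j=1}^n \tilde X_j = 0$.) Combining, we get

$$\mathbb{E}_0\bigl(\mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1})^2\bigr) = \frac{i-1}{(n-i+1)(n-1)}. \tag{4}$$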
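Identity (4) can be checked exactly on a small example. The sketch below is an illustration under our own assumptions: it takes a fixed vector with $\sum_j x_j = 0$ and $\sum_j x_j^2 = n$, makes it exchangeable by a uniformly random ordering, and uses (3) for the conditional mean. The particular vector and the names `lhs`, `rhs` are hypothetical choices.

```python
from fractions import Fraction
from itertools import permutations

# A vector with sum 0 and sum of squares n = 4, so it plays the role of tilde X.
x = [Fraction(7, 5), Fraction(1, 5), Fraction(-7, 5), Fraction(-1, 5)]
n = len(x)
assert sum(x) == 0 and sum(v * v for v in x) == n


def lhs(i):
    """Average over all orderings of E_0(X_i | F_{i-1})^2, computed through (3)."""
    perms = list(permutations(x))
    total = Fraction(0)
    for p in perms:
        cond_mean = -sum(p[: i - 1], Fraction(0)) / (n - i + 1)
        total += cond_mean ** 2
    return total / len(perms)


def rhs(i):
    """Right-hand side of (4)."""
    return Fraction(i - 1, (n - i + 1) * (n - 1))


checks = [(lhs(i), rhs(i)) for i in range(1, n + 1)]
print(checks)
```

Exact rational arithmetic is used so that agreement is an equality rather than a floating-point coincidence.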

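Again, by a similar argument as before (using the identity
$\sum_{j=1}^n \tilde X_j^2 = n$), we have

$$\mathbb{E}_0(\tilde X_i^2 \mid \mathcal{F}_{i-1}) = \frac{1}{n-i+1} \sum_{j=i}^n \tilde X_j^2.$$

It follows that

$$\mathrm{Var}_0\bigl(\mathbb{E}_0(\tilde X_i^2 \mid \mathcal{F}_{i-1})\bigr) = \frac{1}{(n-i+1)^2}\bigl[(n-i+1)\,\mathrm{Var}_0(\tilde X_1^2) + (n-i+1)(n-i)\,\mathrm{Cov}_0(\tilde X_1^2, \tilde X_2^2)\bigr].$$

Now, $\mathrm{Var}_0(\tilde X_1^2) \le \mathbb{E}_0(\tilde X_1^4)$, and

$$\begin{aligned}
\mathrm{Cov}_0(\tilde X_1^2, \tilde X_2^2) &= \frac{1}{n(n-1)} \sum_{i \ne j} \mathbb{E}_0(\tilde X_i^2 \tilde X_j^2) - \mathbb{E}_0(\tilde X_1^2)\,\mathbb{E}_0(\tilde X_2^2) \\
&= \frac{1}{n(n-1)} \sum_{i=1}^n \mathbb{E}_0\bigl(\tilde X_i^2 (n - \tilde X_i^2)\bigr) - 1 \\
&= \frac{1 - \mathbb{E}_0(\tilde X_1^4)}{n-1} \le 0, \quad\text{since } \mathbb{E}_0(\tilde X_1^4) \ge (\mathbb{E}_0(\tilde X_1^2))^2 = 1.
\end{aligned}$$

Combining, we get

$$\mathrm{Var}_0\bigl(\mathbb{E}_0(\tilde X_i^2 \mid \mathcal{F}_{i-1})\bigr) \le \frac{\mathbb{E}_0(\tilde X_1^4)}{n-i+1}. \tag{5}$$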
Now let $G$ be the matrix whose $(i,j)$th element is

$$g_{ij} = \begin{cases} 1/(n-i+1), & \text{if } i > j, \\ 1, & \text{if } i = j, \\ 0, & \text{if } i < j. \end{cases}$$

Let $R = G\tilde X$. Then $R_i$ is $\mathcal{F}_i$-measurable, and from (3), we see that

$$\mathbb{E}_0(R_i \mid \mathcal{F}_{i-1}) = 0. \tag{6}$$
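Next, note that

$$\begin{aligned}
\mathbb{E}_0(R_i^2 \mid \mathcal{F}_{i-1}) &= \mathbb{E}_0\bigl((\tilde X_i - \mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1}))^2 \bigm| \mathcal{F}_{i-1}\bigr) \\
&= \mathbb{E}_0(\tilde X_i^2 \mid \mathcal{F}_{i-1}) - (\mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1}))^2.
\end{aligned}$$

Using (4), (5), the triangle inequality and the fact that
$\mathbb{E}_0(\tilde X_i^2) = 1$, we get

$$\begin{aligned}
\mathbb{E}_0\bigl|\mathbb{E}_0(R_i^2 \mid \mathcal{F}_{i-1}) - 1\bigr| &\le \mathbb{E}_0\bigl|\mathbb{E}_0(\tilde X_i^2 \mid \mathcal{F}_{i-1}) - 1\bigr| + \mathbb{E}_0\bigl(\mathbb{E}_0(\tilde X_i \mid \mathcal{F}_{i-1})^2\bigr) \\
&\le \sqrt{\frac{\mathbb{E}_0(\tilde X_1^4)}{n-i+1}} + \frac{i-1}{(n-i+1)(n-1)} \\
&\le 2\sqrt{\frac{\mathbb{E}_0(\tilde X_1^4)}{n-i+1}}, \quad\text{since } \mathbb{E}_0(\tilde X_1^4) \ge 1.
\end{aligned} \tag{7}$$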

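We will now temporarily use the notation $\|R_i\|_3$ for
$(\mathbb{E}_0|R_i|^3)^{1/3}$, which is the conditional $L^3$ norm of $R_i$.
By Minkowski's inequality and exchangeability, we have

$$\|R_i\|_3 \le \|\tilde X_i\|_3 + \frac{1}{n-i+1} \sum_{j=i}^n \|\tilde X_j\|_3 = 2\|\tilde X_1\|_3.$$

This bound can be rewritten as

$$\mathbb{E}_0|R_i|^3 \le 8\hat\sigma^{-3}\,\mathbb{E}_0|X_1 - \hat\mu|^3. \tag{8}$$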

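Now define the function $f_1:\mathbb{R}^n \to \mathbb{R}$ as
$f_1(x) := f_0(G^{-1}x)$. Then $f_1(R) = f_0(\tilde X)$. Let $g^{ij}$ denote
the $(i,j)$th element of $G^{-1}$. It is a simple exercise to verify that

$$g^{ij} = \begin{cases} -1/(n-j), & \text{if } i > j, \\ 1, & \text{if } i = j, \\ 0, & \text{if } i < j. \end{cases}$$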
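The "simple exercise" can be confirmed in exact arithmetic for a given $n$. The following sketch (helper names are ours) builds $G$ and the claimed inverse with 1-based indices as in the text, multiplies them, and also records the column sums of $|g^{ij}|$, each of which is at most 2.

```python
from fractions import Fraction


def build_G(n):
    """g_ij = 1/(n-i+1) for i > j, 1 on the diagonal, 0 above it (1-based i, j)."""
    return [[Fraction(1, n - i + 1) if i > j else Fraction(int(i == j))
             for j in range(1, n + 1)] for i in range(1, n + 1)]


def build_G_inv(n):
    """Claimed inverse: -1/(n-j) for i > j, 1 on the diagonal, 0 above it."""
    return [[Fraction(-1, n - j) if i > j else Fraction(int(i == j))
             for j in range(1, n + 1)] for i in range(1, n + 1)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 6
G, G_inv = build_G(n), build_G_inv(n)
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
product = matmul(G, G_inv)
# Column sums of absolute values of G^{-1}: 1 + (n-j)/(n-j) = 2 for j < n, and 1 for j = n.
col_abs_sums = [sum(abs(G_inv[i][j]) for i in range(n)) for j in range(n)]
```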

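Using the chain rule we see that for any $j$, $r$ and $x$,

$$\partial_j^r f_1(x) = \sum_{i_1,\ldots,i_r} g^{i_1 j} \cdots g^{i_r j}\; \partial_{i_1} \cdots \partial_{i_r} f_0(G^{-1}x).$$

Thus for any $r$,

$$L_r(f_1) \le L'_r(f_0) \max_j \Bigl(\sum_i |g^{ij}|\Bigr)^r = L'_r(f_0)\,2^r = L'_r(f)\,\hat\sigma^r 2^r, \tag{9}$$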

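where $L_r$ is defined as in the statement of Theorem 1.1. Now let $V$ be a
standard Gaussian vector in $\mathbb{R}^n$. Using the bounds from (6)-(9) in
Theorem 1.1, we get

$$\begin{aligned}
|\mathbb{E}_0 f_1(R) - \mathbb{E}_0 f_1(V)| &\le \tfrac12 L_2(f_1) \sum_{i=1}^n \mathbb{E}_0\bigl|\mathbb{E}_0(R_i^2 \mid \mathcal{F}_{i-1}) - 1\bigr| + \tfrac16\, n L_3(f_1) \max_{1 \le i \le n}\bigl(\mathbb{E}_0|R_i|^3 + \mathbb{E}|V_i|^3\bigr) \\
&\le 4 L'_2(f)\hat\sigma^2 \sum_{i=1}^n \sqrt{\frac{\mathbb{E}_0(\tilde X_1^4)}{n-i+1}} + \tfrac43\, n L'_3(f)\hat\sigma^3 \bigl(8\hat\sigma^{-3}\,\mathbb{E}_0|X_1 - \hat\mu|^3 + \mathbb{E}|V_1|^3\bigr).
\end{aligned}$$

Now $\mathbb{E}_0|\tilde X_1|^4 = \hat\sigma^{-4}\,\mathbb{E}_0|X_1 - \hat\mu|^4$,
and by comparing sums with integrals, we have
$\sum_{i=1}^n (n-i+1)^{-1/2} \le 2\sqrt{n}$. Thus,

$$\begin{aligned}
|\mathbb{E}_0 f_1(R) - \mathbb{E}_0 f_1(V)| &\le 8 L'_2(f) \sqrt{n\,\mathbb{E}_0|X_1 - \hat\mu|^4} \\
&\quad + \tfrac43\, n L'_3(f)\bigl(8\,\mathbb{E}_0|X_1 - \hat\mu|^3 + \hat\sigma^3\,\mathbb{E}|V_1|^3\bigr).
\end{aligned} \tag{10}$$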
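Let $U = G^{-1}V$. Explicitly,

$$U_i = V_i - \sum_{j=1}^{i-1} \frac{V_j}{n-j}.$$

Now note that $f_1(V) = f_0(U)$, $f_0(\tilde Z) = f(Y)$ and
$f_1(R) = f_0(\tilde X) = f(X)$. Combining, we get

$$\begin{aligned}
|\mathbb{E}_0 f(X) - \mathbb{E}_0 f(Y)| &= |\mathbb{E}_0 f_1(R) - \mathbb{E}_0 f_0(\tilde Z)| \\
&\le |\mathbb{E}_0 f_1(R) - \mathbb{E}_0 f_1(V)| + |\mathbb{E}_0 f_0(U) - \mathbb{E}_0 f_0(\tilde Z)|.
\end{aligned} \tag{11}$$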

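We already have a bound on $|\mathbb{E}_0 f_1(R) - \mathbb{E}_0 f_1(V)|$ from
(10). We will now compute a bound on
$|\mathbb{E}_0 f_0(U) - \mathbb{E}_0 f_0(\tilde Z)|$, where recall that
$\tilde Z_i = Z_i - \bar Z$, and $Z$ is a standard Gaussian vector. To do
that, we first need to do some computations. Let
$\tilde\sigma_{ij} := \mathrm{Cov}(U_i, U_j)$. Then

$$\tilde\sigma_{ij} = \begin{cases} -(n-j)^{-1} + \sum_{k=1}^{j-1} (n-k)^{-2}, & \text{if } i > j, \\ 1 + \sum_{k=1}^{j-1} (n-k)^{-2}, & \text{if } i = j, \\ \tilde\sigma_{ji}, & \text{if } i < j. \end{cases}$$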
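Now, for $i > j$, we can rewrite the first term in $\tilde\sigma_{ij}$ as a
telescoping sum to get

$$\begin{aligned}
\tilde\sigma_{ij} &= -\frac{1}{n-1} - \sum_{k=1}^{j-1} \Bigl(\frac{1}{n-k-1} - \frac{1}{n-k}\Bigr) + \sum_{k=1}^{j-1} \frac{1}{(n-k)^2} \\
&= -\frac{1}{n-1} - \sum_{k=1}^{j-1} \frac{1}{(n-k)^2 (n-k-1)}.
\end{aligned}$$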

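Thus, if we define

$$\sigma_{ij} := \mathrm{Cov}(\tilde Z_i, \tilde Z_j) = \begin{cases} -1/n, & \text{if } i \ne j, \\ (n-1)/n, & \text{if } i = j, \end{cases}$$

then

$$\begin{aligned}
\sum_{i,j} |\tilde\sigma_{ij} - \sigma_{ij}| &= \sum_{i=1}^n |\tilde\sigma_{ii} - \sigma_{ii}| + 2 \sum_{i=1}^n \sum_{j=1}^{i-1} |\tilde\sigma_{ij} - \sigma_{ij}| \\
&\le 2 + \sum_{i=1}^n \sum_{k=1}^{i-1} \frac{1}{(n-k)^2} + 2 \sum_{i=1}^n \sum_{j=1}^{i-1} \sum_{k=1}^{j-1} \frac{1}{(n-k)^2 (n-k-1)} \\
&= 2 + \sum_{k=1}^{n-1} \frac{1}{n-k} + \sum_{k=1}^{n-2} \frac{1}{n-k} = 3 + 2 \sum_{k=2}^{n-1} \frac1k.
\end{aligned} \tag{12}$$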
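The covariance computation leading to (12) can be verified exactly for small $n$. This sketch (function names are our own) forms $\mathrm{Cov}(U) = G^{-1}(G^{-1})^{T}$ from the explicit inverse, subtracts the covariance of $\tilde Z$, and compares the entrywise $\ell^1$ distance with $3 + 2\sum_{k=2}^{n-1} 1/k$.

```python
from fractions import Fraction


def cov_U(n):
    """Covariance of U = G^{-1} V for a standard Gaussian V, i.e. G^{-1} (G^{-1})^T."""
    g_inv = [[Fraction(-1, n - j) if i > j else Fraction(int(i == j))
              for j in range(1, n + 1)] for i in range(1, n + 1)]
    return [[sum(g_inv[i][k] * g_inv[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cov_Z_tilde(n):
    """Covariance of Z_i - Z_bar: (n-1)/n on the diagonal, -1/n off it."""
    return [[Fraction(n - 1, n) if i == j else Fraction(-1, n) for j in range(n)]
            for i in range(n)]


def l1_gap(n):
    a, b = cov_U(n), cov_Z_tilde(n)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


def closed_form(n):
    return 3 + 2 * sum(Fraction(1, k) for k in range(2, n))


results = {n: (l1_gap(n), closed_form(n)) for n in range(2, 9)}
print(results[3])
```

For these values of $n$ the two sides agree exactly, so the inequality in (12) is in fact an equality there; the total grows only logarithmically in $n$.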



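We will use the well-known "Gaussian interpolation technique" for bounding
$|\mathbb{E}_0 f_0(U) - \mathbb{E}_0 f_0(\tilde Z)|$. This classical method
for proving Slepian-type inequalities has been used extensively in recent
years by Talagrand [19] in his efforts to obtain a rigorous version of the
cavity method for spin glasses. For each $t \in [0,1]$, let
$W_t = \sqrt{1-t}\,U + \sqrt{t}\,\tilde Z$. Then

$$\begin{aligned}
\mathbb{E}_0 f_0(\tilde Z) - \mathbb{E}_0 f_0(U) &= \int_0^1 \frac{d}{dt}\,\mathbb{E}_0 f_0(W_t)\,dt \\
&= \mathbb{E}_0 \int_0^1 \sum_{i=1}^n \Bigl(\frac{\tilde Z_i}{2\sqrt{t}} - \frac{U_i}{2\sqrt{1-t}}\Bigr)\,\partial_i f_0(W_t)\,dt.
\end{aligned} \tag{13}$$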

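Now, if a random vector $\xi = (\xi_1,\ldots,\xi_n)$ has a centered Gaussian
distribution, then it is not difficult to show using integration by parts
that for any differentiable function $h$ with subexponential growth at
infinity, and any $i$, the following identity holds:

$$\mathbb{E}(\xi_i h(\xi)) = \sum_{j=1}^n \mathbb{E}(\xi_i \xi_j)\,\mathbb{E}(\partial_j h(\xi)).$$

Since we do not want to expand our list of references, let us refer to
Appendix A.6 of Talagrand's book [19] for a proof. Applying this result to
our problem (after noting that interchanging integrals is not an issue since
everything is bounded), we get

$$\mathbb{E}_0\bigl(U_i\,\partial_i f_0(W_t)\bigr) = \sqrt{1-t} \sum_{j=1}^n \tilde\sigma_{ij}\,\mathbb{E}_0\bigl(\partial_j \partial_i f_0(W_t)\bigr)$$

and similarly,

$$\mathbb{E}_0\bigl(\tilde Z_i\,\partial_i f_0(W_t)\bigr) = \sqrt{t} \sum_{j=1}^n \sigma_{ij}\,\mathbb{E}_0\bigl(\partial_j \partial_i f_0(W_t)\bigr).$$

Combining, we have

$$\mathbb{E}_0 f_0(\tilde Z) - \mathbb{E}_0 f_0(U) = \frac12 \int_0^1 \sum_{i,j=1}^n (\sigma_{ij} - \tilde\sigma_{ij})\,\mathbb{E}_0\bigl(\partial_j \partial_i f_0(W_t)\bigr)\,dt.$$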

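Using the bound from (12), we get

$$|\mathbb{E}_0 f_0(U) - \mathbb{E}_0 f_0(\tilde Z)| \le \tfrac12 L'_2(f_0) \Bigl(3 + 2 \sum_{k=2}^{n-1} \frac1k\Bigr). \tag{14}$$

Combining this with (10) and (11), and recalling that
$L'_2(f_0) = L'_2(f)\hat\sigma^2$, we get

$$\begin{aligned}
|\mathbb{E}f(X) - \mathbb{E}f(Y)| &\le \mathbb{E}\,|\mathbb{E}_0 f(X) - \mathbb{E}_0 f(Y)| \\
&\le 8 L'_2(f) \sqrt{n\,\mathbb{E}|X_1 - \hat\mu|^4} \\
&\quad + \tfrac43\, n L'_3(f)\bigl(8\,\mathbb{E}|X_1 - \hat\mu|^3 + \mathbb{E}(\hat\sigma^3)\,\mathbb{E}|V_1|^3\bigr) \\
&\quad + \tfrac12 L'_2(f)\,\mathbb{E}(\hat\sigma^2) \Bigl(3 + 2 \sum_{k=2}^{n-1} \frac1k\Bigr).
\end{aligned}$$

To complete the proof, we apply Jensen's inequality to get
$\mathbb{E}(\hat\sigma^r) \le \mathbb{E}|X_1 - \hat\mu|^r$ for $r \ge 2$ and
use the crude bounds $3 + 2\sum_{k=2}^{n-1} \frac1k \le 3\sqrt{n}$ and
$\mathbb{E}|V_1|^3 \le 1.7$ to unify terms.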
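The arithmetic behind the constants 9.5 and 13 of Theorem 1.2 can be sketched as follows. One point is an assumption on our part: the crude bound on the harmonic-type sum is read here as $3 + 2\sum_{k=2}^{n-1} 1/k \le 3\sqrt{n}$, a reading chosen because it reproduces $9.5 = 8 + 3/2$. The value $\mathbb{E}|V_1|^3 = 2\sqrt{2/\pi}$ for a standard Gaussian is a standard fact.

```python
import math


def crude_bound_holds(n_max):
    """Check 3 + 2*sum_{k=2}^{n-1} 1/k <= 3*sqrt(n) for every 1 <= n <= n_max."""
    partial = 0.0  # running value of sum_{k=2}^{n-1} 1/k
    for n in range(1, n_max + 1):
        if n >= 3:
            partial += 1.0 / (n - 1)
        if 3 + 2 * partial > 3 * math.sqrt(n):
            return False
    return True


abs_third_moment = 2 * math.sqrt(2 / math.pi)  # E|V|^3 for a standard Gaussian, about 1.596
coef_second = 8 + 0.5 * 3                      # from (10) plus half of the assumed 3*sqrt(n)
coef_third = (4 / 3) * (8 + 1.7)               # from (10) with E|V_1|^3 <= 1.7
print(crude_bound_holds(10_000), abs_third_moment, coef_second, coef_third)
```

Under these readings the second-derivative terms combine to $9.5\sqrt{n}$ and the third-derivative terms stay below $13n$.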
4. Proof of Theorem 1.3. We will now prove Theorem 1.3 via an application
of Theorem 1.2, by using the Stieltjes transform of the spectral measure as
a smooth function of the matrix entries. The Stieltjes transform (or Cauchy
transform, or resolvent) of a cumulative probability distribution function
$F$ on $\mathbb{R}$ is defined as

$$m_F(z) := \int_{-\infty}^{\infty} \frac{1}{x - z}\,dF(x) \quad\text{for every } z \in \mathbb{C}\setminus\mathbb{R}. \tag{15}$$

Analogously, the Stieltjes transform of an $N \times N$ Hermitian matrix $A$
at a number $z \in \mathbb{C}\setminus\mathbb{R}$ is defined as

$$m_A(z) := \frac1N \mathrm{Tr}\bigl((A - zI)^{-1}\bigr), \tag{16}$$

where $I$ is the identity matrix of order $N$. Note that this is just the
Stieltjes transform of the empirical spectral distribution (ESD) of $A$. The
ESDs of a sequence $\{A_N\}_{N=1}^{\infty}$ of random Hermitian matrices
converge weakly in probability to a distribution $F$ if and only if

$$m_{A_N}(z) \to m_F(z) \quad\text{in probability, for every } z \in \mathbb{C}\setminus\mathbb{R}.$$

For the proof of this result and further details like Berry-Esseen-type error
bounds, we refer to [2], pages 639-640.
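To make definitions (15) and (16) concrete, the sketch below evaluates the Stieltjes transform of the semicircle law by a midpoint rule and checks that it satisfies $m^2 + zm + 1 = 0$, the standard self-consistency equation for the semicircle law under the sign convention $1/(x - z)$. The quadrature, the function names and the test point $z$ are our own choices.

```python
import math


def stieltjes_semicircle(z, steps=200_000):
    """Midpoint-rule approximation of (15) for the semicircle law on [-2, 2]."""
    h = 4.0 / steps
    total = 0.0 + 0.0j
    for k in range(steps):
        x = -2.0 + (k + 0.5) * h
        total += math.sqrt(4.0 - x * x) / (2.0 * math.pi) / (x - z)
    return total * h


def stieltjes_of_eigenvalues(eigs, z):
    """(16) written through the eigenvalues: (1/N) * sum of 1/(lambda_i - z)."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)


z = 0.5 + 1.0j
m_sc = stieltjes_semicircle(z)
residual = abs(m_sc * m_sc + z * m_sc + 1.0)
m_two_point = stieltjes_of_eigenvalues([1.0, -1.0], 1j)  # ESD with atoms at +1 and -1
print(m_sc, residual, m_two_point)
```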

Now recall that if $A(x)$ is a matrix-valued differentiable function of a
scalar $x$, and $G(x) := (A(x) - zI)^{-1}$, where
$z \in \mathbb{C}\setminus\mathbb{R}$ and $I$ is the identity matrix, then

$$\frac{dG}{dx} = -G\,\frac{dA}{dx}\,G. \tag{17}$$

This standard result is obtained by differentiating both sides of the
identity $G(A - zI) = I$. Differentiability follows from the fact that the
elements of the inverse of a matrix are all rational functions of the
elements of the original matrix. Higher-order derivatives may be computed by
repeatedly applying the above formula.

The following lemma is the key to the proof of Theorem 1.3: 

Lemma 4.1. Suppose that for each $N$ we have a random matrix
$A_N = (N^{-1/2} X^N_{ij})_{1 \le i,j \le N}$, where the collection
$(X^N_{ij})_{1 \le i \le j \le N}$ is exchangeable and $X^N_{ij} = X^N_{ji}$
for $i > j$. Suppose

$$\sum_{i \le j} X^N_{ij} = 0 \quad\text{and}\quad \frac{2}{N(N+1)} \sum_{i \le j} (X^N_{ij})^2 = 1 \quad\text{a.s.}$$

For each $N$, let $(Z^N_{ij})_{1 \le i \le j \le N}$ be a collection of i.i.d.
standard Gaussian random variables and let $Y^N_{ij} = Z^N_{ij} - \bar Z^N$,
where $\bar Z^N = \frac{2}{N(N+1)} \sum_{i \le j} Z^N_{ij}$. Let
$Y^N_{ij} = Y^N_{ji}$ for $i > j$, and let
$B_N = (N^{-1/2} Y^N_{ij})_{1 \le i,j \le N}$. Then, for any
$z \in \mathbb{C}\setminus\mathbb{R}$, and any
$g:\mathbb{R} \to \mathbb{R}$ with bounded derivatives up to the third order,
we have

$$|\mathbb{E}g(\mathrm{Re}\,m_{A_N}(z)) - \mathbb{E}g(\mathrm{Re}\,m_{B_N}(z))| \le C_1 N^{-1} (\mathbb{E}|X^N_{12}|^4)^{1/2} + C_2 N^{-1/2}\,\mathbb{E}|X^N_{12}|^3,$$

where $m$ is the Stieltjes transform as defined in (16) and $C_1$ and $C_2$
are constants depending only on $g$ and $z$. The quantity
$|\mathbb{E}g(\mathrm{Im}\,m_{A_N}(z)) - \mathbb{E}g(\mathrm{Im}\,m_{B_N}(z))|$
also admits the same upper bound.

To complete the proof of Theorem 1.3 using this lemma, we need the 
following fact about spectral distributions of Hermitian matrices: 

Lemma 4.2 (Quoted from [2], Lemma 2.2). Let $A$ and $B$ be two $N \times N$
Hermitian matrices, with empirical distribution functions $F_A$ and $F_B$.
Then

$$\|F_A - F_B\|_\infty \le \frac1N\,\mathrm{rank}(A - B).$$

This lemma is an easy consequence of the well-known interlacing inequal- 
ities for eigenvalues of Hermitian matrices. Let us now complete the proof 
of Theorem 1.3 by combining Lemma 4.1 and Lemma 4.2. 

Proof of Theorem 1.3. Let
$\tilde X^N_{ij} = (X^N_{ij} - \hat\mu_N)/\hat\sigma_N$, and let
$\tilde A_N = (N^{-1/2} \tilde X^N_{ij})_{1 \le i,j \le N}$. Clearly,
$\tilde A_N$ satisfies the hypotheses of Lemma 4.1. Thus, we have

$$|\mathbb{E}g(\mathrm{Re}\,m_{\tilde A_N}(z)) - \mathbb{E}g(\mathrm{Re}\,m_{B_N}(z))| \le C_1 N^{-1} (\mathbb{E}|\tilde X^N_{12}|^4)^{1/2} + C_2 N^{-1/2}\,\mathbb{E}|\tilde X^N_{12}|^3.$$

The same bound holds for
$|\mathbb{E}g(\mathrm{Im}\,m_{\tilde A_N}(z)) - \mathbb{E}g(\mathrm{Im}\,m_{B_N}(z))|$
as well. The bound converges to zero if
$\mathbb{E}|\tilde X^N_{12}|^4 = o(N^{2/3})$, and thus under that condition,
$\tilde A_N$ and $B_N$ must have the same LSD. Finally, observe that by
Lemma 4.2, the sequence $\{\hat\sigma_N^{-1} A_N\}$ has the same LSD as
$\{\tilde A_N\}$, and $F_{B_N}$ converges weakly to the semicircle
distribution in probability. This completes the proof. □



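Proof of Lemma 4.1. To formalize things in a way that is suitable for our
purpose, consider the map $A$ which "constructs" Wigner matrices of order
$N$. Let $n = N(N+1)/2$ and write elements of $\mathbb{R}^n$ as
$x = (x_{ij})_{1 \le i \le j \le N}$. For any $x \in \mathbb{R}^n$, let
$A(x)$ be the matrix defined as

$$A(x)_{ij} := \begin{cases} N^{-1/2} x_{ij}, & \text{if } i \le j, \\ N^{-1/2} x_{ji}, & \text{if } i > j. \end{cases} \tag{18}$$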
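Now let us fix $z = u + \sqrt{-1}\,v \in \mathbb{C}$, with $v \ne 0$. Let
$G(x) := (A(x) - zI)^{-1}$, and define $h:\mathbb{R}^n \to \mathbb{C}$ as

$$h(x) := N^{-1}\,\mathrm{Tr}(G(x)).$$

For any $\alpha \in \{(i,j)\}_{1 \le i \le j \le N}$, we will write
$\partial_\alpha h$ for $\partial h/\partial x_\alpha$ by our usual
convention. From (17), it follows that for any $\alpha$,

$$\partial_\alpha h = -N^{-1}\,\mathrm{Tr}\bigl(G(\partial_\alpha A)G\bigr). \tag{19}$$

Now note that for any $\alpha, \beta \in \{(i,j)\}_{1 \le i \le j \le N}$, we
have $\partial_\beta \partial_\alpha A \equiv 0$. An easy computation
involving repeated applications of (17) to the above expression for
$\partial_\alpha h$ gives, for any
$\alpha, \beta, \gamma \in \{(i,j)\}_{1 \le i \le j \le N}$,

$$\partial_\beta \partial_\alpha h = N^{-1} \sum_{\{\beta',\alpha'\} = \{\beta,\alpha\}} \mathrm{Tr}\bigl(G(\partial_{\beta'} A)G(\partial_{\alpha'} A)G\bigr), \tag{20}$$

$$\partial_\gamma \partial_\beta \partial_\alpha h = -N^{-1} \sum_{\{\gamma',\beta',\alpha'\} = \{\gamma,\beta,\alpha\}} \mathrm{Tr}\bigl(G(\partial_{\gamma'} A)G(\partial_{\beta'} A)G(\partial_{\alpha'} A)G\bigr). \tag{21}$$

Note that the first sum runs over all permutations of $(\beta, \alpha)$,
which amounts to only two terms. Similarly, the second sum involves six
terms.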

To bound (19), first note that
$\mathrm{Tr}(G(\partial_\alpha A)G) = \mathrm{Tr}((\partial_\alpha A)G^2)$.
Since $G^2$ has a spectral decomposition and all its eigenvalues are bounded
by $|v|^{-2}$ in magnitude, it follows in particular that the elements of
$G^2$ are also bounded by $|v|^{-2}$. Now, $\partial_\alpha A$ has at most
two nonzero elements, which are equal to $N^{-1/2}$. Hence,

$$|\mathrm{Tr}(G(\partial_\alpha A)G)| = |\mathrm{Tr}((\partial_\alpha A)G^2)| \le 2|v|^{-2} N^{-1/2}.$$
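Formula (19) and the bound just derived can be checked numerically in the smallest nontrivial case $N = 2$, where the resolvent has a closed form. This is a sketch under our own choices (the point $x$, the value of $z$, and all helper names); the derivative is compared with a central finite difference in the off-diagonal entry.

```python
import math

N = 2
SCALE = 1 / math.sqrt(N)


def wigner(x11, x12, x22):
    """A(x) from (18) for N = 2."""
    return [[SCALE * x11, SCALE * x12], [SCALE * x12, SCALE * x22]]


def resolvent(a, z):
    """(A - zI)^{-1} for a 2x2 matrix via the adjugate formula."""
    p, q, r, s = a[0][0] - z, a[0][1], a[1][0], a[1][1] - z
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def h(x11, x12, x22, z):
    return trace(resolvent(wigner(x11, x12, x22), z)) / N


z = 0.3 + 0.8j
x = (0.5, -1.2, 0.9)
G = resolvent(wigner(*x), z)
dA = [[0.0, SCALE], [SCALE, 0.0]]        # derivative of A(x) in the coordinate x_12
tr_term = trace(matmul(matmul(G, dA), G))
analytic = -tr_term / N                  # right-hand side of (19)
eps = 1e-6
numeric = (h(x[0], x[1] + eps, x[2], z) - h(x[0], x[1] - eps, x[2], z)) / (2 * eps)
bound = 2 * abs(z.imag) ** -2 * N ** -0.5
print(analytic, numeric, abs(tr_term), bound)
```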

To bound (20) and (21), we need to recall the properties of the
Hilbert-Schmidt norm for matrices. For an $N \times N$ complex matrix
$B = (b_{ij})_{1 \le i,j \le N}$, the Hilbert-Schmidt norm of $B$ is defined
as $\|B\| := (\sum_{i,j} |b_{ij}|^2)^{1/2}$. Besides the usual properties of
a matrix norm, it has the following additional features:
(a) $|\mathrm{Tr}(BC)| \le \|B\|\|C\|$, (b) if $U$ is a unitary matrix, then
for any $C$ of the same order, $\|CU\| = \|UC\| = \|C\|$, and (c) for a
matrix $B$ admitting a spectral decomposition (i.e., a normal matrix) with
eigenvalues $\lambda_1,\ldots,\lambda_N$, and any other matrix $C$ of the
same order,
$\max\{\|BC\|, \|CB\|\} \le \max_{1 \le i \le N} |\lambda_i| \cdot \|C\|$.
For a proof of these standard facts one can look up, for example, [22],
pages 55-58.

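Clearly, $G$ and the derivatives of $A$ are all normal matrices. Moreover,
the eigenvalues of $G$ are bounded by $|v|^{-1}$, where $v = \mathrm{Im}\,z$.
Thus, by the properties of the Hilbert-Schmidt norm listed above, we have

$$\begin{aligned}
|\mathrm{Tr}(G(\partial_\beta A)G(\partial_\alpha A)G)| &\le \|G(\partial_\beta A)\|\,\|G(\partial_\alpha A)G\| \\
&\le |v|^{-3}\,\|\partial_\beta A\|\,\|\partial_\alpha A\| \\
&\le 2|v|^{-3} N^{-1}.
\end{aligned}$$

Similarly,

$$\begin{aligned}
|\mathrm{Tr}(G(\partial_\gamma A)G(\partial_\beta A)G(\partial_\alpha A)G)| &\le \|G(\partial_\gamma A)\|\,\|G(\partial_\beta A)G(\partial_\alpha A)G\| \\
&\le \|G(\partial_\gamma A)\|\,\|G(\partial_\beta A)G\|\,\|(\partial_\alpha A)G\| \\
&\le |v|^{-4}\,\|\partial_\gamma A\|\,\|\partial_\beta A\|\,\|\partial_\alpha A\| \\
&\le 2^{3/2} |v|^{-4} N^{-3/2}.
\end{aligned}$$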
Finally, note that since the matrix entries are all real,
$\partial_\alpha(\mathrm{Re}\,h) = \mathrm{Re}\,\partial_\alpha h$ and so on.
Thus, if we let $f = g \circ (\mathrm{Re}\,h)$, then substituting the bounds
obtained above in (19), (20) and (21), we get $L'_2(f) \le K_1 N^{-2}$ and
$L'_3(f) \le K_2 N^{-5/2}$, where $K_1$ and $K_2$ are constants depending
only on $g$ and $z$. By Theorem 1.2, it now follows that

$$|\mathbb{E}f(X) - \mathbb{E}f(Y)| \le 9.5 K_1 N^{-1} (\mathbb{E}|X_{12}|^4)^{1/2} + 13 K_2 N^{-1/2}\,\mathbb{E}|X_{12}|^3,$$

where $Y_{ij} = Z_{ij} - \bar Z$, and the $Z_{ij}$'s are i.i.d. standard
Gaussian random variables. This completes the proof of the lemma. □